Language Resource Development at DLSU-NLP Lab
Abstract
In 2003, the Department of Science and Technology awarded a 5-million-peso grant to De La Salle University for the development of an English-Filipino Machine Translation System. Faced with limited resources for the Filipino language, the team had to build language resources and develop language tools in order to complete the system. This paper presents the resources and tools built by the team and the successive improvements to these projects after 2006.
Similar resources
A Survey of NLP Methods and Resources for Analyzing the Collaborative Writing Process in Wikipedia
With the rise of the Web 2.0, participatory and collaborative content production have largely replaced the traditional ways of information sharing and have created the novel genre of collaboratively constructed language resources. A vast untapped potential lies in the dynamic aspects of these resources, which cannot be unleashed with traditional methods designed for static corpora. In this chap...
UIMA-Based JCoRe 2.0 Goes GitHub and Maven Central ― State-of-the-Art Software Resource Engineering and Distribution of NLP Pipelines
We introduce JCORE 2.0, the relaunch of a UIMA-based open software repository for full-scale natural language processing originating from the Jena University Language & Information Engineering (JULIE) Lab. In an attempt to put the new release of JCORE on firm software engineering ground, we uploaded it to GITHUB, a social coding platform, with an underlying source code versioning system and var...
Participation in Language Resource Development and Sharing
Language resources are essential for understanding and modeling language in current approaches. A language with rich resources benefits greatly in advancing language processing. A less-resourced language, on the other hand, struggles to prepare a sufficiently large language resource such as raw text or annotated corpora. It is a labor intens...
A Semi-Automatic, Iterative Method for Creating a Domain-Specific Treebank
In this paper we present the development process of NLP-QT, a question treebank that will be used for data-driven parsing in the context of a domain-specific QA system for querying NLP resource metadata. We motivate the need to build NLP-QT as a resource in its own right, by comparing the Penn Treebank-style annotation scheme used for QuestionBank (Judge et al., 2006) with the modified NP annot...
Towards Enhanced Interoperability for Large HLT Systems: UIMA for NLP
We introduce JCORE, a full-fledged UIMA-compliant component repository for complex text analytics developed at the Jena University Language & Information Engineering (JULIE) Lab. JCORE is based on a comprehensive type system and a variety of document readers, analysis engines, and CAS consumers. We survey these components and then turn to a discussion of lessons we learnt, with particular empha...